Constructing a Text Generation Model

Using most of the techniques you've already learned, it's now possible to generate new text by predicting the word most likely to follow a given seed text. To practice this method, we'll use the Kaggle Song Lyrics Dataset.

In [1]:
import tensorflow as tf

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Other imports for processing data
import string
import numpy as np
import pandas as pd

Get the Dataset

As noted above, we'll use the Song Lyrics dataset on Kaggle.

In [2]:
!wget --no-check-certificate \
    https://drive.google.com/uc?id=1LiJFZd41ofrWoBtW-pMYsfz1w8Ny0Bj8 \
    -O /tmp/songdata.csv
--2020-08-09 03:38:04--  https://drive.google.com/uc?id=1LiJFZd41ofrWoBtW-pMYsfz1w8Ny0Bj8
Resolving drive.google.com (drive.google.com)... 74.125.203.102, 74.125.203.100, 74.125.203.113, ...
Connecting to drive.google.com (drive.google.com)|74.125.203.102|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-04-ak-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/knnc0v7obic9cfo7di4dtovne8vb8cr9/1596944250000/11118900490791463723/*/1LiJFZd41ofrWoBtW-pMYsfz1w8Ny0Bj8 [following]
Warning: wildcards not supported in HTTP.
--2020-08-09 03:38:07--  https://doc-04-ak-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/knnc0v7obic9cfo7di4dtovne8vb8cr9/1596944250000/11118900490791463723/*/1LiJFZd41ofrWoBtW-pMYsfz1w8Ny0Bj8
Resolving doc-04-ak-docs.googleusercontent.com (doc-04-ak-docs.googleusercontent.com)... 74.125.203.132, 2404:6800:4008:c03::84
Connecting to doc-04-ak-docs.googleusercontent.com (doc-04-ak-docs.googleusercontent.com)|74.125.203.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘/tmp/songdata.csv’

/tmp/songdata.csv       [     <=>            ]  69.08M  56.9MB/s    in 1.2s    

2020-08-09 03:38:09 (56.9 MB/s) - ‘/tmp/songdata.csv’ saved [72436445]

First 10 Songs

Let's first look at just 10 songs from the dataset and see how the model performs.

Preprocessing

Let's perform some basic preprocessing to remove punctuation and make everything lowercase. We'll then split the lyrics into individual lines and tokenize them.

In [3]:
def tokenize_corpus(corpus, num_words=-1):
  # Fit a Tokenizer on the corpus
  if num_words > -1:
    tokenizer = Tokenizer(num_words=num_words)
  else:
    tokenizer = Tokenizer()
  tokenizer.fit_on_texts(corpus)
  return tokenizer

def create_lyrics_corpus(dataset, field):
  # Remove all punctuation (the bracketed punctuation characters form a regex character class)
  dataset[field] = dataset[field].str.replace('[{}]'.format(string.punctuation), '', regex=True)
  # Make it lowercase
  dataset[field] = dataset[field].str.lower()
  # Make it one long string to split by line
  lyrics = dataset[field].str.cat()
  corpus = lyrics.split('\n')
  # Remove any trailing whitespace
  for l in range(len(corpus)):
    corpus[l] = corpus[l].rstrip()
  # Remove any empty lines
  corpus = [l for l in corpus if l != '']

  return corpus
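
As a quick sanity check, here's how the helper behaves on a tiny made-up DataFrame (this two-line snippet is just for illustration, not taken from the dataset):

# Hypothetical example: one 'text' entry containing two lyric lines
sample = pd.DataFrame({'text': ["Look at her face, it's a wonderful face!\nAnd it means something special to me\n"]})
print(create_lyrics_corpus(sample, 'text'))
# Expected output (roughly): ['look at her face its a wonderful face', 'and it means something special to me']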
In [4]:
# Read the dataset from csv - just first 10 songs for now
dataset = pd.read_csv('/tmp/songdata.csv', dtype=str)[:10]
# Create the corpus using the 'text' column containing lyrics
corpus = create_lyrics_corpus(dataset, 'text')
# Tokenize the corpus
tokenizer = tokenize_corpus(corpus)

total_words = len(tokenizer.word_index) + 1

print(tokenizer.word_index)
print(total_words)
{'you': 1, 'i': 2, 'and': 3, 'a': 4, 'me': 5, 'the': 6, 'is': 7, 'my': 8, 'to': 9, 'ma': 10, 'it': 11, 'of': 12, 'im': 13, 'your': 14, 'love': 15, 'so': 16, 'as': 17, 'that': 18, 'in': 19, 'andante': 20, 'boomaboomerang': 21, 'make': 22, 'on': 23, 'oh': 24, 'for': 25, 'but': 26, 'new': 27, 'bang': 28, 'its': 29, 'be': 30, 'like': 31, 'know': 32, 'now': 33, 'how': 34, 'could': 35, 'youre': 36, 'sing': 37, 'never': 38, 'no': 39, 'chiquitita': 40, 'can': 41, 'we': 42, 'song': 43, 'had': 44, 'good': 45, 'youll': 46, 'she': 47, 'just': 48, 'girl': 49, 'again': 50, 'will': 51, 'take': 52, 'please': 53, 'let': 54, 'am': 55, 'eyes': 56, 'was': 57, 'always': 58, 'cassandra': 59, 'blue': 60, 'time': 61, 'dont': 62, 'were': 63, 'return': 64, 'once': 65, 'then': 66, 'sorry': 67, 'cryin': 68, 'over': 69, 'feel': 70, 'ever': 71, 'believe': 72, 'what': 73, 'do': 74, 'go': 75, 'all': 76, 'out': 77, 'think': 78, 'every': 79, 'leave': 80, 'look': 81, 'at': 82, 'way': 83, 'one': 84, 'music': 85, 'down': 86, 'our': 87, 'give': 88, 'learn': 89, 'more': 90, 'us': 91, 'would': 92, 'there': 93, 'before': 94, 'when': 95, 'with': 96, 'feeling': 97, 'play': 98, 'cause': 99, 'away': 100, 'here': 101, 'have': 102, 'yes': 103, 'baby': 104, 'get': 105, 'didnt': 106, 'see': 107, 'did': 108, 'closed': 109, 'realized': 110, 'crazy': 111, 'world': 112, 'lord': 113, 'shes': 114, 'kind': 115, 'without': 116, 'if': 117, 'touch': 118, 'strong': 119, 'making': 120, 'such': 121, 'found': 122, 'true': 123, 'stay': 124, 'together': 125, 'thought': 126, 'come': 127, 'they': 128, 'sweet': 129, 'tender': 130, 'sender': 131, 'tune': 132, 'humdehumhum': 133, 'gonna': 134, 'last': 135, 'leaving': 136, 'sleep': 137, 'only': 138, 'saw': 139, 'tell': 140, 'hes': 141, 'her': 142, 'sound': 143, 'tread': 144, 'lightly': 145, 'ground': 146, 'ill': 147, 'show': 148, 'life': 149, 'too': 150, 'used': 151, 'darling': 152, 'meant': 153, 'break': 154, 'end': 155, 'yourself': 156, 'little': 157, 'dumbedumdum': 158, 'bedumbedumdum': 159, 'youve': 160, 'dumbbedumbdumb': 161, 'bedumbbedumbdumb': 162, 'by': 163, 'theyre': 164, 'alone': 165, 'misunderstood': 166, 'day': 167, 'dawning': 168, 'some': 169, 'wanted': 170, 'none': 171, 'listen': 172, 'words': 173, 'warning': 174, 'darkest': 175, 'nights': 176, 'nobody': 177, 'knew': 178, 'fight': 179, 'caught': 180, 'really': 181, 'power': 182, 'dreams': 183, 'weave': 184, 'until': 185, 'final': 186, 'hour': 187, 'morning': 188, 'ship': 189, 'gone': 190, 'grieving': 191, 'still': 192, 'pain': 193, 'cry': 194, 'sun': 195, 'try': 196, 'face': 197, 'something': 198, 'sees': 199, 'makes': 200, 'fine': 201, 'who': 202, 'mine': 203, 'leaves': 204, 'walk': 205, 'hand': 206, 'well': 207, 'about': 208, 'things': 209, 'slow': 210, 'theres': 211, 'talk': 212, 'why': 213, 'up': 214, 'lousy': 215, 'packing': 216, 'ive': 217, 'gotta': 218, 'near': 219, 'keeping': 220, 'intention': 221, 'growing': 222, 'taking': 223, 'dimension': 224, 'even': 225, 'better': 226, 'thank': 227, 'god': 228, 'not': 229, 'somebody': 230, 'happy': 231, 'question': 232, 'smile': 233, 'mean': 234, 'much': 235, 'kisses': 236, 'around': 237, 'anywhere': 238, 'advice': 239, 'care': 240, 'use': 241, 'selfish': 242, 'tool': 243, 'fool': 244, 'showing': 245, 'boomerang': 246, 'throwing': 247, 'warm': 248, 'kiss': 249, 'surrender': 250, 'giving': 251, 'been': 252, 'door': 253, 'burning': 254, 'bridges': 255, 'being': 256, 'moving': 257, 'though': 258, 'behind': 259, 'are': 260, 'must': 261, 'sure': 262, 'stood': 263, 'hope': 264, 'this': 265, 'deny': 266, 
'sad': 267, 'quiet': 268, 'truth': 269, 'heartaches': 270, 'scars': 271, 'dancing': 272, 'sky': 273, 'shining': 274, 'above': 275, 'hear': 276, 'came': 277, 'couldnt': 278, 'everything': 279, 'back': 280, 'long': 281, 'waitin': 282, 'cold': 283, 'chills': 284, 'bone': 285, 'youd': 286, 'wonderful': 287, 'means': 288, 'special': 289, 'smiles': 290, 'lucky': 291, 'fellow': 292, 'park': 293, 'holds': 294, 'squeezes': 295, 'walking': 296, 'hours': 297, 'talking': 298, 'plan': 299, 'easy': 300, 'gently': 301, 'summer': 302, 'evening': 303, 'breeze': 304, 'grow': 305, 'fingers': 306, 'soft': 307, 'light': 308, 'body': 309, 'velvet': 310, 'night': 311, 'soul': 312, 'slowly': 313, 'shimmer': 314, 'thousand': 315, 'butterflies': 316, 'float': 317, 'put': 318, 'rotten': 319, 'boy': 320, 'tough': 321, 'stuff': 322, 'saying': 323, 'need': 324, 'anymore': 325, 'enough': 326, 'standing': 327, 'creep': 328, 'felt': 329, 'cheap': 330, 'notion': 331, 'deep': 332, 'dumb': 333, 'mistake': 334, 'entitled': 335, 'another': 336, 'beg': 337, 'forgive': 338, 'an': 339, 'feels': 340, 'hoot': 341, 'holler': 342, 'mad': 343, 'under': 344, 'heel': 345, 'holy': 346, 'christ': 347, 'deal': 348, 'sick': 349, 'tired': 350, 'tedious': 351, 'ways': 352, 'aint': 353, 'walkin': 354, 'cutting': 355, 'tie': 356, 'wanna': 357, 'into': 358, 'eye': 359, 'myself': 360, 'counting': 361, 'pride': 362, 'unright': 363, 'neighbours': 364, 'ride': 365, 'burying': 366, 'past': 367, 'peace': 368, 'free': 369, 'sucker': 370, 'street': 371, 'singing': 372, 'shouting': 373, 'staying': 374, 'alive': 375, 'city': 376, 'dead': 377, 'hiding': 378, 'their': 379, 'shame': 380, 'hollow': 381, 'laughter': 382, 'while': 383, 'crying': 384, 'bed': 385, 'pity': 386, 'believed': 387, 'lost': 388, 'from': 389, 'start': 390, 'suffer': 391, 'sell': 392, 'secrets': 393, 'bargain': 394, 'playing': 395, 'smart': 396, 'aching': 397, 'hearts': 398, 'sailing': 399, 'father': 400, 'sister': 401, 'reason': 402, 'linger': 403, 'deeply': 404, 'future': 405, 'casting': 406, 'shadow': 407, 'else': 408, 'fate': 409, 'bags': 410, 'thorough': 411, 'knowing': 412, 'late': 413, 'wait': 414, 'watched': 415, 'harbor': 416, 'sunrise': 417, 'sails': 418, 'almost': 419, 'slack': 420, 'cool': 421, 'rain': 422, 'deck': 423, 'tiny': 424, 'figure': 425, 'rigid': 426, 'restrained': 427, 'filled': 428, 'whats': 429, 'wrong': 430, 'enchained': 431, 'own': 432, 'sorrow': 433, 'tomorrow': 434, 'hate': 435, 'shoulder': 436, 'best': 437, 'friend': 438, 'rely': 439, 'broken': 440, 'feather': 441, 'patch': 442, 'walls': 443, 'tumbling': 444, 'loves': 445, 'blown': 446, 'candle': 447, 'seems': 448, 'hard': 449, 'handle': 450, 'id': 451, 'thinking': 452, 'went': 453, 'house': 454, 'hardly': 455, 'guy': 456, 'closing': 457, 'front': 458, 'emptiness': 459, 'he': 460, 'disapeared': 461, 'his': 462, 'car': 463, 'stunned': 464, 'dreamed': 465, 'lifes': 466, 'part': 467, 'move': 468, 'feet': 469, 'pavement': 470, 'acted': 471, 'told': 472, 'lies': 473, 'meet': 474, 'other': 475, 'guys': 476, 'stupid': 477, 'blind': 478, 'smiled': 479, 'took': 480, 'said': 481, 'may': 482, 'couple': 483, 'men': 484, 'them': 485, 'brother': 486, 'joe': 487, 'seeing': 488, 'lot': 489, 'him': 490, 'nice': 491, 'sitting': 492, 'sittin': 493, 'memories': 494}
495

Create Sequences and Labels

After preprocessing, we next need to create sequences and labels. Creating the sequences is similar to before with texts_to_sequences, except that each tokenized line is also expanded into its n-gram subsequences; the labels then come from those sequences, with the final token of each sequence one-hot encoded over all potential output words.
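
To make the n-gram idea concrete: the first lyric line tokenizes to ids beginning [81, 82, 142, 197] ('look at her face' in the word index printed above), and each line is expanded into subsequences that grow one token at a time. A minimal standalone illustration:

# Toy illustration of the n-gram expansion used below
token_list = [81, 82, 142, 197]   # 'look at her face'
n_grams = [token_list[:i+1] for i in range(1, len(token_list))]
print(n_grams)   # [[81, 82], [81, 82, 142], [81, 82, 142, 197]]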

In [5]:
sequences = []
for line in corpus:
	token_list = tokenizer.texts_to_sequences([line])[0]
	for i in range(1, len(token_list)):
		n_gram_sequence = token_list[:i+1]
		sequences.append(n_gram_sequence)

# Pad sequences for equal input length 
max_sequence_len = max([len(seq) for seq in sequences])
sequences = np.array(pad_sequences(sequences, maxlen=max_sequence_len, padding='pre'))

# Split sequences between the "input" sequence and "output" predicted word
input_sequences, labels = sequences[:,:-1], sequences[:,-1]
# One-hot encode the labels
one_hot_labels = tf.keras.utils.to_categorical(labels, num_classes=total_words)
In [6]:
# Check out how some of our data is being stored
# The Tokenizer has just a single index per word
print(tokenizer.word_index['know'])
print(tokenizer.word_index['feeling'])
# Input sequences will have multiple indexes
print(input_sequences[5])
print(input_sequences[6])
# And the one hot labels will be as long as the full spread of tokenized words
print(one_hot_labels[5])
print(one_hot_labels[6])
32
97
[  0   0   0   0   0   0   0   0   0   0   0   0   0  81  82 142 197  29
   4]
[  0   0   0   0   0   0   0   0   0   0   0   0  81  82 142 197  29   4
 287]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Train a Text Generation Model

Building an RNN to train our text generation model will be very similar to the sentiment models you've built previously. The only real change necessary is to use categorical instead of binary cross entropy as the loss function - binary worked before because the sentiment was only 0 or 1, but now there are hundreds of possible output words (categories).

From there, we should also consider using more epochs than before, as text generation can take a little longer to converge than sentiment analysis, and we aren't working with all that much data yet. I'll set it at 200 epochs here since we're only using part of the dataset, and improvement will tail off quite a bit over that many epochs.

In [7]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional

model = Sequential()
model.add(Embedding(total_words, 64, input_length=max_sequence_len-1))
model.add(Bidirectional(LSTM(20)))
model.add(Dense(total_words, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(input_sequences, one_hot_labels, epochs=200, verbose=1)
Epoch 1/200
62/62 [==============================] - 1s 12ms/step - loss: 5.9724 - accuracy: 0.0242
Epoch 2/200
62/62 [==============================] - 1s 12ms/step - loss: 5.4292 - accuracy: 0.0394
Epoch 3/200
62/62 [==============================] - 1s 12ms/step - loss: 5.3681 - accuracy: 0.0399
Epoch 4/200
62/62 [==============================] - 1s 12ms/step - loss: 5.3081 - accuracy: 0.0404
Epoch 5/200
62/62 [==============================] - 1s 12ms/step - loss: 5.2372 - accuracy: 0.0424
Epoch 6/200
62/62 [==============================] - 1s 12ms/step - loss: 5.1663 - accuracy: 0.0449
Epoch 7/200
62/62 [==============================] - 1s 12ms/step - loss: 5.1005 - accuracy: 0.0484
Epoch 8/200
62/62 [==============================] - 1s 12ms/step - loss: 5.0319 - accuracy: 0.0520
Epoch 9/200
62/62 [==============================] - 1s 12ms/step - loss: 4.9532 - accuracy: 0.0681
Epoch 10/200
62/62 [==============================] - 1s 12ms/step - loss: 4.8623 - accuracy: 0.0858
Epoch 11/200
62/62 [==============================] - 1s 12ms/step - loss: 4.7598 - accuracy: 0.0943
Epoch 12/200
62/62 [==============================] - 1s 12ms/step - loss: 4.6497 - accuracy: 0.1070
Epoch 13/200
62/62 [==============================] - 1s 12ms/step - loss: 4.5489 - accuracy: 0.1201
Epoch 14/200
62/62 [==============================] - 1s 12ms/step - loss: 4.4575 - accuracy: 0.1256
Epoch 15/200
62/62 [==============================] - 1s 12ms/step - loss: 4.3593 - accuracy: 0.1327
Epoch 16/200
62/62 [==============================] - 1s 12ms/step - loss: 4.2700 - accuracy: 0.1524
Epoch 17/200
62/62 [==============================] - 1s 12ms/step - loss: 4.1736 - accuracy: 0.1549
Epoch 18/200
62/62 [==============================] - 1s 12ms/step - loss: 4.0863 - accuracy: 0.1690
Epoch 19/200
62/62 [==============================] - 1s 12ms/step - loss: 4.0035 - accuracy: 0.1826
Epoch 20/200
62/62 [==============================] - 1s 12ms/step - loss: 3.9241 - accuracy: 0.1922
Epoch 21/200
62/62 [==============================] - 1s 12ms/step - loss: 3.8481 - accuracy: 0.2074
Epoch 22/200
62/62 [==============================] - 1s 12ms/step - loss: 3.7718 - accuracy: 0.2154
Epoch 23/200
62/62 [==============================] - 1s 12ms/step - loss: 3.6924 - accuracy: 0.2371
Epoch 24/200
62/62 [==============================] - 1s 12ms/step - loss: 3.6175 - accuracy: 0.2523
Epoch 25/200
62/62 [==============================] - 1s 12ms/step - loss: 3.5545 - accuracy: 0.2659
Epoch 26/200
62/62 [==============================] - 1s 12ms/step - loss: 3.4843 - accuracy: 0.2785
Epoch 27/200
62/62 [==============================] - 1s 12ms/step - loss: 3.4280 - accuracy: 0.2856
Epoch 28/200
62/62 [==============================] - 1s 12ms/step - loss: 3.3448 - accuracy: 0.3158
Epoch 29/200
62/62 [==============================] - 1s 12ms/step - loss: 3.2783 - accuracy: 0.3204
Epoch 30/200
62/62 [==============================] - 1s 12ms/step - loss: 3.2234 - accuracy: 0.3305
Epoch 31/200
62/62 [==============================] - 1s 12ms/step - loss: 3.1559 - accuracy: 0.3380
Epoch 32/200
62/62 [==============================] - 1s 12ms/step - loss: 3.1083 - accuracy: 0.3532
Epoch 33/200
62/62 [==============================] - 1s 12ms/step - loss: 3.0483 - accuracy: 0.3602
Epoch 34/200
62/62 [==============================] - 1s 12ms/step - loss: 2.9765 - accuracy: 0.3870
Epoch 35/200
62/62 [==============================] - 1s 12ms/step - loss: 2.9178 - accuracy: 0.3915
Epoch 36/200
62/62 [==============================] - 1s 12ms/step - loss: 2.8648 - accuracy: 0.4011
Epoch 37/200
62/62 [==============================] - 1s 12ms/step - loss: 2.8182 - accuracy: 0.4117
Epoch 38/200
62/62 [==============================] - 1s 12ms/step - loss: 2.7906 - accuracy: 0.4198
Epoch 39/200
62/62 [==============================] - 1s 12ms/step - loss: 2.7176 - accuracy: 0.4374
Epoch 40/200
62/62 [==============================] - 1s 12ms/step - loss: 2.6564 - accuracy: 0.4480
Epoch 41/200
62/62 [==============================] - 1s 12ms/step - loss: 2.6083 - accuracy: 0.4536
Epoch 42/200
62/62 [==============================] - 1s 12ms/step - loss: 2.5767 - accuracy: 0.4652
Epoch 43/200
62/62 [==============================] - 1s 12ms/step - loss: 2.5161 - accuracy: 0.4743
Epoch 44/200
62/62 [==============================] - 1s 12ms/step - loss: 2.4689 - accuracy: 0.4854
Epoch 45/200
62/62 [==============================] - 1s 12ms/step - loss: 2.4165 - accuracy: 0.4919
Epoch 46/200
62/62 [==============================] - 1s 12ms/step - loss: 2.3789 - accuracy: 0.4980
Epoch 47/200
62/62 [==============================] - 1s 12ms/step - loss: 2.3292 - accuracy: 0.5050
Epoch 48/200
62/62 [==============================] - 1s 12ms/step - loss: 2.3068 - accuracy: 0.5111
Epoch 49/200
62/62 [==============================] - 1s 12ms/step - loss: 2.2623 - accuracy: 0.5227
Epoch 50/200
62/62 [==============================] - 1s 12ms/step - loss: 2.2117 - accuracy: 0.5328
Epoch 51/200
62/62 [==============================] - 1s 12ms/step - loss: 2.1699 - accuracy: 0.5348
Epoch 52/200
62/62 [==============================] - 1s 12ms/step - loss: 2.1316 - accuracy: 0.5515
Epoch 53/200
62/62 [==============================] - 1s 12ms/step - loss: 2.0936 - accuracy: 0.5590
Epoch 54/200
62/62 [==============================] - 1s 12ms/step - loss: 2.0573 - accuracy: 0.5651
Epoch 55/200
62/62 [==============================] - 1s 12ms/step - loss: 2.0290 - accuracy: 0.5827
Epoch 56/200
62/62 [==============================] - 1s 12ms/step - loss: 2.0133 - accuracy: 0.5767
Epoch 57/200
62/62 [==============================] - 1s 12ms/step - loss: 1.9877 - accuracy: 0.5918
Epoch 58/200
62/62 [==============================] - 1s 12ms/step - loss: 1.9610 - accuracy: 0.5898
Epoch 59/200
62/62 [==============================] - 1s 12ms/step - loss: 1.9115 - accuracy: 0.5984
Epoch 60/200
62/62 [==============================] - 1s 12ms/step - loss: 1.8776 - accuracy: 0.6039
Epoch 61/200
62/62 [==============================] - 1s 12ms/step - loss: 1.8356 - accuracy: 0.6160
Epoch 62/200
62/62 [==============================] - 1s 12ms/step - loss: 1.8023 - accuracy: 0.6206
Epoch 63/200
62/62 [==============================] - 1s 12ms/step - loss: 1.7700 - accuracy: 0.6282
Epoch 64/200
62/62 [==============================] - 1s 12ms/step - loss: 1.7376 - accuracy: 0.6367
Epoch 65/200
62/62 [==============================] - 1s 12ms/step - loss: 1.7119 - accuracy: 0.6418
Epoch 66/200
62/62 [==============================] - 1s 12ms/step - loss: 1.6993 - accuracy: 0.6478
Epoch 67/200
62/62 [==============================] - 1s 12ms/step - loss: 1.6875 - accuracy: 0.6453
Epoch 68/200
62/62 [==============================] - 1s 12ms/step - loss: 1.6407 - accuracy: 0.6514
Epoch 69/200
62/62 [==============================] - 1s 12ms/step - loss: 1.6228 - accuracy: 0.6635
Epoch 70/200
62/62 [==============================] - 1s 12ms/step - loss: 1.5839 - accuracy: 0.6710
Epoch 71/200
62/62 [==============================] - 1s 12ms/step - loss: 1.5651 - accuracy: 0.6705
Epoch 72/200
62/62 [==============================] - 1s 12ms/step - loss: 1.5358 - accuracy: 0.6776
Epoch 73/200
62/62 [==============================] - 1s 12ms/step - loss: 1.4952 - accuracy: 0.6867
Epoch 74/200
62/62 [==============================] - 1s 12ms/step - loss: 1.4604 - accuracy: 0.6958
Epoch 75/200
62/62 [==============================] - 1s 12ms/step - loss: 1.4427 - accuracy: 0.6988
Epoch 76/200
62/62 [==============================] - 1s 12ms/step - loss: 1.4148 - accuracy: 0.7149
Epoch 77/200
62/62 [==============================] - 1s 12ms/step - loss: 1.3930 - accuracy: 0.7170
Epoch 78/200
62/62 [==============================] - 1s 12ms/step - loss: 1.3732 - accuracy: 0.7190
Epoch 79/200
62/62 [==============================] - 1s 12ms/step - loss: 1.3470 - accuracy: 0.7245
Epoch 80/200
62/62 [==============================] - 1s 12ms/step - loss: 1.3257 - accuracy: 0.7316
Epoch 81/200
62/62 [==============================] - 1s 12ms/step - loss: 1.3140 - accuracy: 0.7331
Epoch 82/200
62/62 [==============================] - 1s 12ms/step - loss: 1.3094 - accuracy: 0.7311
Epoch 83/200
62/62 [==============================] - 1s 12ms/step - loss: 1.2939 - accuracy: 0.7306
Epoch 84/200
62/62 [==============================] - 1s 12ms/step - loss: 1.2621 - accuracy: 0.7422
Epoch 85/200
62/62 [==============================] - 1s 12ms/step - loss: 1.2546 - accuracy: 0.7472
Epoch 86/200
62/62 [==============================] - 1s 12ms/step - loss: 1.2130 - accuracy: 0.7533
Epoch 87/200
62/62 [==============================] - 1s 12ms/step - loss: 1.1923 - accuracy: 0.7608
Epoch 88/200
62/62 [==============================] - 1s 12ms/step - loss: 1.1650 - accuracy: 0.7659
Epoch 89/200
62/62 [==============================] - 1s 12ms/step - loss: 1.1537 - accuracy: 0.7694
Epoch 90/200
62/62 [==============================] - 1s 12ms/step - loss: 1.1534 - accuracy: 0.7619
Epoch 91/200
62/62 [==============================] - 1s 12ms/step - loss: 1.1250 - accuracy: 0.7770
Epoch 92/200
62/62 [==============================] - 1s 12ms/step - loss: 1.1160 - accuracy: 0.7770
Epoch 93/200
62/62 [==============================] - 1s 12ms/step - loss: 1.0920 - accuracy: 0.7790
Epoch 94/200
62/62 [==============================] - 1s 12ms/step - loss: 1.0797 - accuracy: 0.7825
Epoch 95/200
62/62 [==============================] - 1s 12ms/step - loss: 1.0744 - accuracy: 0.7810
Epoch 96/200
62/62 [==============================] - 1s 12ms/step - loss: 1.0604 - accuracy: 0.7820
Epoch 97/200
62/62 [==============================] - 1s 12ms/step - loss: 1.1200 - accuracy: 0.7634
Epoch 98/200
62/62 [==============================] - 1s 12ms/step - loss: 1.1580 - accuracy: 0.7548
Epoch 99/200
62/62 [==============================] - 1s 12ms/step - loss: 1.1155 - accuracy: 0.7654
Epoch 100/200
62/62 [==============================] - 1s 12ms/step - loss: 1.0531 - accuracy: 0.7881
Epoch 101/200
62/62 [==============================] - 1s 12ms/step - loss: 1.0029 - accuracy: 0.7962
Epoch 102/200
62/62 [==============================] - 1s 12ms/step - loss: 0.9886 - accuracy: 0.7997
Epoch 103/200
62/62 [==============================] - 1s 12ms/step - loss: 0.9667 - accuracy: 0.8037
Epoch 104/200
62/62 [==============================] - 1s 12ms/step - loss: 0.9546 - accuracy: 0.8113
Epoch 105/200
62/62 [==============================] - 1s 12ms/step - loss: 0.9346 - accuracy: 0.8174
Epoch 106/200
62/62 [==============================] - 1s 12ms/step - loss: 0.9238 - accuracy: 0.8148
Epoch 107/200
62/62 [==============================] - 1s 12ms/step - loss: 0.9068 - accuracy: 0.8179
Epoch 108/200
62/62 [==============================] - 1s 12ms/step - loss: 0.8920 - accuracy: 0.8194
Epoch 109/200
62/62 [==============================] - 1s 12ms/step - loss: 0.8814 - accuracy: 0.8280
Epoch 110/200
62/62 [==============================] - 1s 12ms/step - loss: 0.8691 - accuracy: 0.8214
Epoch 111/200
62/62 [==============================] - 1s 12ms/step - loss: 0.8644 - accuracy: 0.8290
Epoch 112/200
62/62 [==============================] - 1s 12ms/step - loss: 0.8514 - accuracy: 0.8234
Epoch 113/200
62/62 [==============================] - 1s 12ms/step - loss: 0.8606 - accuracy: 0.8224
Epoch 114/200
62/62 [==============================] - 1s 12ms/step - loss: 0.9316 - accuracy: 0.7982
Epoch 115/200
62/62 [==============================] - 1s 12ms/step - loss: 0.8746 - accuracy: 0.8174
Epoch 116/200
62/62 [==============================] - 1s 12ms/step - loss: 0.8515 - accuracy: 0.8184
Epoch 117/200
62/62 [==============================] - 1s 12ms/step - loss: 0.8235 - accuracy: 0.8219
Epoch 118/200
62/62 [==============================] - 1s 12ms/step - loss: 0.8081 - accuracy: 0.8310
Epoch 119/200
62/62 [==============================] - 1s 12ms/step - loss: 0.8113 - accuracy: 0.8345
Epoch 120/200
62/62 [==============================] - 1s 12ms/step - loss: 0.7916 - accuracy: 0.8375
Epoch 121/200
62/62 [==============================] - 1s 12ms/step - loss: 0.7771 - accuracy: 0.8446
Epoch 122/200
62/62 [==============================] - 1s 12ms/step - loss: 0.7633 - accuracy: 0.8406
Epoch 123/200
62/62 [==============================] - 1s 12ms/step - loss: 0.7582 - accuracy: 0.8370
Epoch 124/200
62/62 [==============================] - 1s 12ms/step - loss: 0.7574 - accuracy: 0.8416
Epoch 125/200
62/62 [==============================] - 1s 12ms/step - loss: 0.7398 - accuracy: 0.8486
Epoch 126/200
62/62 [==============================] - 1s 12ms/step - loss: 0.7237 - accuracy: 0.8507
Epoch 127/200
62/62 [==============================] - 1s 12ms/step - loss: 0.7091 - accuracy: 0.8502
Epoch 128/200
62/62 [==============================] - 1s 12ms/step - loss: 0.7094 - accuracy: 0.8522
Epoch 129/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6952 - accuracy: 0.8562
Epoch 130/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6905 - accuracy: 0.8592
Epoch 131/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6781 - accuracy: 0.8557
Epoch 132/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6728 - accuracy: 0.8557
Epoch 133/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6657 - accuracy: 0.8587
Epoch 134/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6545 - accuracy: 0.8602
Epoch 135/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6460 - accuracy: 0.8673
Epoch 136/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6413 - accuracy: 0.8613
Epoch 137/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6381 - accuracy: 0.8663
Epoch 138/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6286 - accuracy: 0.8628
Epoch 139/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6525 - accuracy: 0.8562
Epoch 140/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6521 - accuracy: 0.8562
Epoch 141/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6365 - accuracy: 0.8592
Epoch 142/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6118 - accuracy: 0.8658
Epoch 143/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6262 - accuracy: 0.8602
Epoch 144/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6330 - accuracy: 0.8577
Epoch 145/200
62/62 [==============================] - 1s 12ms/step - loss: 0.6116 - accuracy: 0.8653
Epoch 146/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5963 - accuracy: 0.8668
Epoch 147/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5905 - accuracy: 0.8713
Epoch 148/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5796 - accuracy: 0.8729
Epoch 149/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5802 - accuracy: 0.8713
Epoch 150/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5791 - accuracy: 0.8729
Epoch 151/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5686 - accuracy: 0.8744
Epoch 152/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5535 - accuracy: 0.8794
Epoch 153/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5459 - accuracy: 0.8779
Epoch 154/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5383 - accuracy: 0.8769
Epoch 155/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5316 - accuracy: 0.8804
Epoch 156/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5265 - accuracy: 0.8829
Epoch 157/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5200 - accuracy: 0.8779
Epoch 158/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5170 - accuracy: 0.8809
Epoch 159/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5527 - accuracy: 0.8698
Epoch 160/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5365 - accuracy: 0.8749
Epoch 161/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5134 - accuracy: 0.8850
Epoch 162/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5071 - accuracy: 0.8789
Epoch 163/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5030 - accuracy: 0.8774
Epoch 164/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5444 - accuracy: 0.8749
Epoch 165/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5229 - accuracy: 0.8804
Epoch 166/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5140 - accuracy: 0.8784
Epoch 167/200
62/62 [==============================] - 1s 12ms/step - loss: 0.5084 - accuracy: 0.8759
Epoch 168/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4933 - accuracy: 0.8789
Epoch 169/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4867 - accuracy: 0.8829
Epoch 170/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4805 - accuracy: 0.8845
Epoch 171/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4684 - accuracy: 0.8860
Epoch 172/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4813 - accuracy: 0.8880
Epoch 173/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4685 - accuracy: 0.8840
Epoch 174/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4555 - accuracy: 0.8895
Epoch 175/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4523 - accuracy: 0.8865
Epoch 176/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4599 - accuracy: 0.8865
Epoch 177/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4589 - accuracy: 0.8865
Epoch 178/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4471 - accuracy: 0.8875
Epoch 179/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4813 - accuracy: 0.8794
Epoch 180/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4968 - accuracy: 0.8764
Epoch 181/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4731 - accuracy: 0.8774
Epoch 182/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4611 - accuracy: 0.8819
Epoch 183/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4384 - accuracy: 0.8865
Epoch 184/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4295 - accuracy: 0.8930
Epoch 185/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4191 - accuracy: 0.8935
Epoch 186/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4165 - accuracy: 0.8905
Epoch 187/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4147 - accuracy: 0.8930
Epoch 188/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4408 - accuracy: 0.8850
Epoch 189/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4187 - accuracy: 0.8905
Epoch 190/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4117 - accuracy: 0.8920
Epoch 191/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4080 - accuracy: 0.8961
Epoch 192/200
62/62 [==============================] - 1s 12ms/step - loss: 0.4059 - accuracy: 0.8935
Epoch 193/200
62/62 [==============================] - 1s 12ms/step - loss: 0.3991 - accuracy: 0.8961
Epoch 194/200
62/62 [==============================] - 1s 12ms/step - loss: 0.3939 - accuracy: 0.8971
Epoch 195/200
62/62 [==============================] - 1s 12ms/step - loss: 0.3893 - accuracy: 0.8951
Epoch 196/200
62/62 [==============================] - 1s 12ms/step - loss: 0.3852 - accuracy: 0.9011
Epoch 197/200
62/62 [==============================] - 1s 12ms/step - loss: 0.3817 - accuracy: 0.9011
Epoch 198/200
62/62 [==============================] - 1s 12ms/step - loss: 0.3774 - accuracy: 0.8991
Epoch 199/200
62/62 [==============================] - 1s 12ms/step - loss: 0.3776 - accuracy: 0.9041
Epoch 200/200
62/62 [==============================] - 1s 12ms/step - loss: 0.3785 - accuracy: 0.8966

View the Training Graph

In [8]:
import matplotlib.pyplot as plt

def plot_graphs(history, string):
  plt.plot(history.history[string])
  plt.xlabel("Epochs")
  plt.ylabel(string)
  plt.show()

plot_graphs(history, 'accuracy')
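
The accuracy curve should climb steadily toward roughly 90% over the 200 epochs, with a few small dips along the way (matching the training log above). If you'd also like to see the loss curve, the same helper works with the 'loss' key:

plot_graphs(history, 'loss')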

Generate new lyrics!

It's finally time to generate some new lyrics from the trained model and see what we get. To do so, we'll provide some "seed text", an input sequence for the model to start with. We'll also decide how long an output sequence we want - this could go on essentially forever, since the seed plus each previously generated word is fed back in to predict the next word (padded or truncated to the maximum sequence length each time).

In [9]:
seed_text = "im feeling chills"
next_words = 100
  
for _ in range(next_words):
	token_list = tokenizer.texts_to_sequences([seed_text])[0]
	token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
	predicted = np.argmax(model.predict(token_list), axis=-1)
	output_word = ""
	for word, index in tokenizer.word_index.items():
		if index == predicted:
			output_word = word
			break
	seed_text += " " + output_word
print(seed_text)
im feeling chills me to the bone tumbling been under down me mad and every showing sailing sailing life can would kind blue alive found think think dont dont lot never found am and friend girl me think think think do closed misunderstood never am again never chiquitita never andante never would had had had had stood slow lightly a christ kind realized dimension realized boomaboomerang love chills suffer and be cutting my lightly on ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma ma
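
Notice how the output eventually gets stuck repeating "ma": greedy argmax decoding always picks the single most likely next word, so small models often fall into loops like this. One optional tweak (a sketch only, not part of the notebook above) is to sample the next word from the predicted probability distribution instead:

# Optional sketch: sample the next word instead of taking the argmax
probs = model.predict(token_list)[0]      # softmax probabilities over the vocabulary
probs = probs / probs.sum()               # renormalize to guard against float rounding
predicted = np.random.choice(len(probs), p=probs)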